1 Time Series : CRYPTOCURRENCY

library(knitr)
knitr::include_graphics("cryptocurrency_image1_1130490519_670x377px_300dpi.jpg")

Img Source : https://www.kaspersky.com/content/en-global/images/repository/isc/2021/cryptocurrency_image1_1130490519_670x377px_300dpi.jpg

Dataset link: https://www.kaggle.com/datasets/jessevent/all-crypto-currencies/data

Cryptocurrency is a form of digital or virtual currency that relies on cryptographic techniques to secure financial transactions, control the creation of new units, and verify the transfer of assets. Unlike traditional currencies issued by governments, cryptocurrencies operate on decentralized networks based on blockchain technology. A blockchain is a distributed ledger that records all transactions across a network of computers. One of the key features of cryptocurrencies is decentralization, meaning they are not controlled by any central authority such as a government or financial institution. Bitcoin, created in 2009, was the first decentralized cryptocurrency, and since then, numerous other cryptocurrencies, often referred to as altcoins, have been developed. Cryptocurrencies offer the potential for increased financial privacy, lower transaction fees, and borderless transactions, but they also pose challenges such as regulatory concerns, volatility, and security risks.

As we explore the world of cryptocurrency and its decentralized nature, the application of time series analysis emerges as a valuable tool. By delving into historical trends and behaviors of digital assets, we can harness this approach to not only understand the past but also predict potential future market dynamics, providing a practical means for navigating the complexities of decentralized finance.

Time series refers to a sequence of data points collected or recorded over a specific period at equally spaced intervals. These data points are typically ordered chronologically, allowing for the analysis of patterns, trends, and behaviors over time. Time series analysis is a fundamental method in various fields such as economics, finance, weather forecasting, and signal processing. It enables the identification of temporal patterns, seasonality, and anomalies within the data, facilitating predictions and decision-making based on historical trends. Time series data often involves studying how a particular variable changes over time, providing valuable insights into the underlying dynamics of a system or phenomenon.
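As a small illustration of the "equally spaced intervals" requirement, the sketch below checks whether a set of dates is regular. The dates and values are made up for the example and are not taken from the crypto dataset.

```r
# Synthetic daily series: 10 consecutive dates with made-up values
dates  <- seq(as.Date("2020-01-01"), by = "day", length.out = 10)
values <- c(5, 7, 6, 8, 9, 11, 10, 12, 13, 15)

# Equal spacing means every gap between consecutive dates is identical
gaps <- as.numeric(diff(dates))
is_regular <- length(unique(gaps)) == 1

# Dropping one date breaks the regular spacing
gaps_broken <- as.numeric(diff(dates[-4]))
is_regular_broken <- length(unique(gaps_broken)) == 1

print(is_regular)         # TRUE
print(is_regular_broken)  # FALSE
```

A series that fails this kind of check usually needs its missing dates filled in before it is converted to a ts object.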

1.1 Objective

To forecast the Bitcoin closing price over the coming months, comparing two train/test splits: the first holds out the last year of data as the test set, and the second holds out only the last half-year.

2 Preparation

The first step is to read the CSV file into R and then load the required packages, including dplyr, lubridate, padr, etc.

# Read data csv
crypto <- read.csv("crypto-markets.csv")

# Load libraries for time series analysis and forecasting
library(dplyr)     # Data manipulation and transformation
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lubridate)  # Date and time manipulation
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(padr)       # Padding and filling missing time series data
## Warning: package 'padr' was built under R version 4.3.2
library(zoo)        # Time series data manipulation
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(forecast)   # Time series forecasting
## Warning: package 'forecast' was built under R version 4.3.2
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(TTR)        # Technical Trading Rules
## Warning: package 'TTR' was built under R version 4.3.2
library(MLmetrics)  # Machine learning evaluation metrics
## 
## Attaching package: 'MLmetrics'
## The following object is masked from 'package:base':
## 
##     Recall
library(tseries)    # Time series analysis
## Warning: package 'tseries' was built under R version 4.3.2
library(fpp)        # Forecasting principles and practice
## Warning: package 'fpp' was built under R version 4.3.2
## Loading required package: fma
## Warning: package 'fma' was built under R version 4.3.2
## Loading required package: expsmooth
## Warning: package 'expsmooth' was built under R version 4.3.2
## Loading required package: lmtest
library(TSstudio)   # Time series visualization
## Warning: package 'TSstudio' was built under R version 4.3.2
library(ggplot2)    # Data visualization
## Warning: package 'ggplot2' was built under R version 4.3.2
library(plotly)     # Interactive plots
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(tidyr)      # Data tidying
library(glue)       # String manipulation

Next, we inspect the data set that we imported from the CSV file.

head(crypto)

We also use glimpse() to check all the columns and their types.

glimpse(crypto)
## Rows: 942,297
## Columns: 13
## $ slug        <chr> "bitcoin", "bitcoin", "bitcoin", "bitcoin", "bitcoin", "bi…
## $ symbol      <chr> "BTC", "BTC", "BTC", "BTC", "BTC", "BTC", "BTC", "BTC", "B…
## $ name        <chr> "Bitcoin", "Bitcoin", "Bitcoin", "Bitcoin", "Bitcoin", "Bi…
## $ date        <chr> "2013-04-28", "2013-04-29", "2013-04-30", "2013-05-01", "2…
## $ ranknow     <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ open        <dbl> 135.30, 134.44, 144.00, 139.00, 116.38, 106.25, 98.10, 112…
## $ high        <dbl> 135.98, 147.49, 146.93, 139.89, 125.60, 108.13, 115.00, 11…
## $ low         <dbl> 132.10, 134.00, 134.05, 107.72, 92.28, 79.10, 92.50, 107.1…
## $ close       <dbl> 134.21, 144.54, 139.00, 116.99, 105.21, 97.75, 112.50, 115…
## $ volume      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ market      <dbl> 1488566728, 1603768865, 1542813125, 1298954594, 1168517495…
## $ close_ratio <dbl> 0.5438, 0.7813, 0.3843, 0.2882, 0.3881, 0.6424, 0.8889, 0.…
## $ spread      <dbl> 3.88, 13.49, 12.88, 32.17, 33.32, 29.03, 22.50, 11.66, 18.…

Checking whether the dataset contains any NA values:

colSums(is.na(crypto))
##        slug      symbol        name        date     ranknow        open 
##           0           0           0           0           0           0 
##        high         low       close      volume      market close_ratio 
##           0           0           0           0           0           0 
##      spread 
##           0

Filter to Bitcoin only and keep the date and closing price:

bitcoin <- crypto %>% 
  filter(slug == "bitcoin") %>% # keep only the bitcoin rows
  select(date, close) # keep only the date and close columns
# Converting the date column to Date class
bitcoin <- bitcoin %>%
  mutate(date = as.Date(date))
# Checking whether the series needs padding (sort by date, pad gaps, look for NA)
bitcoin %>%
  arrange(date) %>%
  pad() %>%
  anyNA() 
## pad applied on the interval: day
## [1] FALSE
tail(bitcoin)
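# Daily series with a yearly seasonal period (frequency = 365)
bitcoin_ts <- ts(data = bitcoin$close,
   frequency = 365)
# Classical decomposition into trend, seasonal, and random components
bitcoin_decom <- decompose(bitcoin_ts)
autoplot(bitcoin_decom)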

The decomposition plot above shows both a trend that rises over time and a seasonal component. Therefore, we will use Triple Exponential Smoothing (Holt-Winters), which models level, trend, and seasonality.
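To make the idea of an additive decomposition concrete, here is a minimal sketch on a synthetic monthly series. The data are made up for illustration. It shows that the trend, seasonal, and random components returned by decompose() add back up to the observed series.

```r
# Synthetic monthly series: linear trend + fixed seasonal pattern + noise
set.seed(42)
n_years  <- 6
seasonal <- rep(c(-3, -2, -1, 0, 1, 2, 3, 2, 1, 0, -1, -2), times = n_years)
trend    <- seq(10, 40, length.out = 12 * n_years)
noise    <- rnorm(12 * n_years, sd = 0.5)
y        <- ts(trend + seasonal + noise, frequency = 12)

# Classical additive decomposition from the stats package
dec <- decompose(y, type = "additive")

# The three components add back up to the observed series
# (the ends are NA because the moving-average trend is undefined there)
rebuilt <- dec$trend + dec$seasonal + dec$random
max_gap <- max(abs(rebuilt - y), na.rm = TRUE)
print(max_gap)
```

The gap should be zero up to floating-point error. An additive model of this kind is also what the "additive" seasonal option of Holt-Winters assumes.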

3 Seasonality Analysis

bitcoin %>%
ggplot(aes(date, close)) +
  geom_line() +
  theme_minimal()

The line plot shows Bitcoin on a predominantly upward trajectory until it peaked around the turn of 2017 to 2018. The market then entered a bearish phase that persisted until the end of the dataset, marking a clear shift from an upward to a downward trend.

# Yearly Dataset
bitcoin_y <- bitcoin %>% 
  mutate(month = month(date, label = TRUE),      # extract the month
         seasonal = bitcoin_decom$seasonal
      ) %>% # extract the seasonal component
  distinct(month, seasonal) %>% # keep unique values across the two columns
  group_by(month) %>%
  summarise(seasonal = mean(seasonal)) %>% 
  mutate( 
        label = glue("Month: {month}
                 Seasonal: {seasonal}"))
plot_y <- ggplot(bitcoin_y, aes(x=month, y=seasonal))+
  geom_col(fill = "lightgreen", aes(text = label))+
  scale_fill_gradient() +
  labs(title = "Yearly Seasonal Distribution",
       x = NULL,
       y = "Seasonal Value") +
  theme_minimal()
## Warning in geom_col(fill = "lightgreen", aes(text = label)): Ignoring unknown
## aesthetics: text
ggplotly(plot_y, tooltip = "text")

The chart shows that seasonal values peak at the start and end of the year: December has the highest value at 1499.29, followed by January at 989.58. The lowest value occurs in September, at -572.22.
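The monthly aggregation behind a chart like this can be sketched in base R. The example below uses a made-up sinusoidal seasonal component instead of the Bitcoin decomposition, and averages all daily values within each month (a simplification of the dplyr pipeline above).

```r
# Synthetic stand-in: one year of daily dates with a made-up seasonal component
dates    <- seq(as.Date("2021-01-01"), as.Date("2021-12-31"), by = "day")
seasonal <- sin(2 * pi * seq_along(dates) / length(dates))

# Average the seasonal component within each calendar month
month_id     <- as.integer(format(dates, "%m"))
monthly_mean <- tapply(seasonal, month_id, mean)

# Months with the highest and lowest average seasonal value
peak_month   <- as.integer(names(which.max(monthly_mean)))
trough_month <- as.integer(names(which.min(monthly_mean)))
print(round(monthly_mean, 3))
print(c(peak = peak_month, trough = trough_month))
```

Reading the peak and trough from the aggregated values, rather than from the plot alone, makes it easier to quote exact figures.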

4 Model Fitting and Analysis

4.1 One Year Test

# Using one year (365 days) as the test set
data_test <- tail(bitcoin_ts, 365)
data_train <- head(bitcoin_ts, length(bitcoin_ts)-365)
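The same hold-out split can be sanity-checked on a synthetic series. In this sketch the random walk `y` is made up and only stands in for the closing prices. The check confirms that the training and test pieces cover the series exactly once and in order.

```r
# Synthetic daily series standing in for the closing prices
set.seed(1)
y <- ts(cumsum(rnorm(1000)), frequency = 365)

h <- 365                          # length of the hold-out period
test  <- tail(y, h)               # last h observations
train <- head(y, length(y) - h)   # everything before them

# The two pieces should cover the series exactly once, in order
print(c(train = length(train), test = length(test)))
covers_series <- isTRUE(all.equal(as.numeric(c(train, test)), as.numeric(y)))
print(covers_series)
```

Keeping the test set strictly at the end of the series matters for time series, because a random split would let the model see observations from the future.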

4.1.1 Exponential Smoothing - Triple Exponential Smoothing

# Modeling Triple Exponential Smoothing and Additive seasonal
data_es <- HoltWinters(x = data_train,seasonal = "additive") 

# Forecasting 365 days after the data cut-off
data_forecast_es <- forecast(data_es, 365)

# Checking accuracy
MAE(data_forecast_es$mean,data_test)
## [1] 12369.12
# Plotting Visualization
plot(data_forecast_es)
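For readers who want to try the fit, forecast, and score workflow without the dataset, here is a minimal base-R sketch on made-up monthly data. It uses predict() from the stats package in place of forecast(), and writes the MAE out by hand; both are choices made for this example, not the code used above.

```r
# Synthetic monthly series with trend and additive seasonality (made-up data)
set.seed(123)
n <- 120
y <- 50 + 0.5 * seq_len(n) +
  10 * sin(2 * pi * seq_len(n) / 12) +
  rnorm(n, sd = 1)

# Hold out the last 12 observations
train <- ts(y[1:(n - 12)], frequency = 12)
test  <- y[(n - 11):n]

# Triple exponential smoothing with additive seasonality (stats package)
fit  <- HoltWinters(train, seasonal = "additive")
pred <- predict(fit, n.ahead = length(test))

# Mean absolute error, written out by hand
mae <- mean(abs(as.numeric(pred) - test))
print(mae)
```

The MAE is expressed in the same units as the series, so it is only comparable between models that are scored on the same test period.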

4.1.2 ARIMA

# Modeling with ARIMA on the STL seasonally adjusted series
data_arima <- stlm(data_train, method = "arima")

# Forecasting 365 days after the data cut-off
data_forecast_arima <- forecast(data_arima, 365)

# Checking accuracy
MAE(data_forecast_arima$mean,data_test)
## [1] 9711.51
# Plotting Visualization
plot(data_forecast_arima)

ETS

# Modeling with ETS on the STL seasonally adjusted series
data_ets <- stlm(data_train, method = "ets")

# Forecasting 365 days after the data cut-off
data_forecast_ets <- forecast(data_ets, 365)

# Checking accuracy
MAE(data_forecast_ets$mean,data_test)
## [1] 10216.71
# Plotting Visualization
plot(data_forecast_ets)

Half-A-Year Test

# Using half a year (182 days) as the test set
data_test_1 <- tail(bitcoin_ts, 182)
data_train_1 <- head(bitcoin_ts, length(bitcoin_ts)-182)

4.1.3 Exponential Smoothing - Triple Exponential Smoothing

# Modeling Triple Exponential Smoothing and Additive seasonal
data_es_1 <- HoltWinters(x = data_train_1,seasonal = "additive") 

# Forecasting 182 days after the data cut-off
data_forecast_es_1 <- forecast(data_es_1, 182)

# Checking accuracy against the half-year test set
MAE(data_forecast_es_1$mean,data_test_1)
## [1] 1015.164
# Plotting Visualization
plot(data_forecast_es_1)

4.1.4 ARIMA

# Modeling with ARIMA on the STL seasonally adjusted series
data_arima_1 <- stlm(data_train_1, method = "arima")

# Forecasting 182 days after the data cut-off
data_forecast_arima_1 <- forecast(data_arima_1, 182)

# Checking accuracy against the half-year test set
MAE(data_forecast_arima_1$mean,data_test_1)
## [1] 1728.625
# Plotting Visualization
plot(data_forecast_arima_1)

4.1.5 ETS

# Modeling with ETS on the STL seasonally adjusted series
data_ets_1 <- stlm(data_train_1, method = "ets")

# Forecasting 182 days after the data cut-off
data_forecast_ets_1 <- forecast(data_ets_1, 182)

# Checking accuracy against the half-year test set
MAE(data_forecast_ets_1$mean,data_test_1)
## [1] 1714.111
# Plotting Visualization
plot(data_forecast_ets_1)
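With several splits that share similar names, it is easy to score a forecast against a test set of the wrong length. A small guard can catch this early. The helper `mae_checked` below is a hypothetical name introduced for this sketch, and the vectors are made-up placeholders.

```r
# Hypothetical helper: MAE that refuses inputs of different lengths
mae_checked <- function(predicted, actual) {
  stopifnot(length(predicted) == length(actual))
  mean(abs(as.numeric(predicted) - as.numeric(actual)))
}

pred_182 <- rep(10, 182)   # made-up 182-step forecast
test_182 <- rep(12, 182)   # matching 182-value test set
test_365 <- rep(12, 365)   # one-year test set of the wrong length

print(mae_checked(pred_182, test_182))   # 2

# A mismatched pair raises an error instead of being scored
mismatch <- tryCatch(mae_checked(pred_182, test_365),
                     error = function(e) "length mismatch")
print(mismatch)
```

The explicit length check keeps each forecast paired with the test set from the same split.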

Prediction Performance

4.2 One Year Test

In the evaluation of cryptocurrency forecasting models based on Mean Absolute Error (MAE) for a one-year data test, the results indicate that Autoregressive Integrated Moving Average (ARIMA) outperformed the other models. ARIMA yielded the lowest MAE of 9711.51, suggesting higher accuracy in predicting cryptocurrency prices during the specified period. Following ARIMA, Error-Trend-Seasonality (ETS) had a MAE of 10216.71, while Exponential Smoothing (ES) exhibited the highest MAE at 12369.12.

In summary, the ranking from best to least accurate based on MAE is as follows:

  • ARIMA (MAE = 9711.51)
  • ETS (MAE = 10216.71)
  • Exponential Smoothing (ES) (MAE = 12369.12)
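The ranking can be reproduced directly from the reported values. The sketch below only re-sorts the three one-year MAE figures quoted above; the vector name `mae_one_year` is introduced here for illustration.

```r
# MAE values reported for the one-year test
mae_one_year <- c(ES = 12369.12, ARIMA = 9711.51, ETS = 10216.71)

# Sort ascending: lower MAE means a more accurate forecast
ranking <- sort(mae_one_year)
print(ranking)

# Each model's MAE relative to the best one
relative <- round(ranking / min(ranking), 3)
print(relative)
```

The relative figures show how far behind the best model each alternative is, which is easier to judge than the raw differences.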

4.3 Half-A-Year Test

The models are ranked based on their MAE values, with lower MAE indicating greater accuracy. In this instance, Exponential Smoothing (ES) exhibited the lowest MAE, suggesting it performed the best in predicting cryptocurrency prices during the specified half-year period. Following ES, Error-Trend-Seasonality (ETS) and Autoregressive Integrated Moving Average (ARIMA) had higher MAE values, with ETS showing slightly better performance than ARIMA.

To summarize the performance ranking:

  • ES (MAE = 1015.164)
  • ETS (MAE = 1714.111)
  • ARIMA (MAE = 1728.625)

5 Conclusion

In the one-year test, Autoregressive Integrated Moving Average (ARIMA) performed best, with the lowest Mean Absolute Error (MAE) of 9711.51. In the half-year test, Exponential Smoothing (ES) performed best, with the lowest MAE of 1015.164, which makes it the preferable model for shorter-term predictions. The choice of model therefore depends on the forecasting horizon: ARIMA is favored for longer-term predictions and ES for shorter-term forecasts, based on their respective MAE values. Going forward, we will use the ES model from the Half-A-Year Test.

# No Autocorrelation Test
Box.test(data_forecast_es_1$residuals)
## 
##  Box-Pierce test
## 
## data:  data_forecast_es_1$residuals
## X-squared = 11.299, df = 1, p-value = 0.0007753

The Box-Pierce test on the residuals of data_forecast_es_1 gives a chi-squared statistic of 11.299 with 1 degree of freedom and a p-value of 0.0007753. Because the p-value is below 0.05, we reject the null hypothesis of no autocorrelation: the residuals are still autocorrelated, so the model has not captured all of the structure in the series.
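To see what the test reacts to, the sketch below applies Box.test() to a synthetic AR(1) series with strong autocorrelation. The series is simulated for illustration and is unrelated to the model residuals. The Ljung-Box variant over several lags is included as a common alternative, not as something used in the analysis above.

```r
set.seed(2024)

# Synthetic AR(1) series with strong positive autocorrelation (made-up data)
ar1 <- as.numeric(arima.sim(model = list(ar = 0.8), n = 500))

# Box-Pierce test at lag 1 (the default used by Box.test)
bp <- Box.test(ar1, lag = 1, type = "Box-Pierce")

# Ljung-Box variant over several lags, a common alternative
lb <- Box.test(ar1, lag = 10, type = "Ljung-Box")

print(bp$p.value)
print(lb$p.value)
```

Both p-values should be far below 0.05 here, which is the same pattern seen in the residual test above.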

# Normality Test
shapiro.test(data_forecast_es_1$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  data_forecast_es_1$residuals
## W = 0.51593, p-value < 2.2e-16

The Shapiro-Wilk normality test conducted on the residuals of data_forecast_es_1 demonstrates a W statistic of 0.51593 and an extremely low p-value (< 2.2e-16), indicating a departure from normal distribution. Therefore, the residuals do not exhibit a normal spread based on the results of the Shapiro-Wilk test.
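As a point of comparison, the sketch below runs the same test on a made-up right-skewed sample. The exponential draws stand in for non-normal residuals and are an assumption of this example. The sample skewness is computed by hand as a quick numeric companion to the test.

```r
set.seed(7)

# Made-up right-skewed sample standing in for non-normal residuals
skewed <- rexp(500, rate = 1)

# Shapiro-Wilk: a small p-value is evidence against normality
sw <- shapiro.test(skewed)
print(sw$statistic)
print(sw$p.value)

# Sample skewness as a quick numeric companion to the test
skewness <- mean((skewed - mean(skewed))^3) / sd(skewed)^3
print(skewness)
```

A W statistic well below 1 together with a tiny p-value, as in the residual test above, points to heavy tails or skew, so prediction intervals that assume normal errors should be read with caution.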